Growing Scale-Free Networks with Tunable Clustering 
We extend the standard scale-free network model to include a "triad formation step". We analyze
the geometric properties of networks generated by this algorithm both analytically and by numerical
calculations, and find that our model possesses the same characteristics as the standard scale-free
networks, such as the power-law degree distribution and the small average geodesic length, but with
high clustering at the same time. In our model, the clustering coefficient is also shown to be
tunable simply by changing a control parameter: the average number of triad formation trials per
time step.
A great number of systems in many branches of science
can be modeled as large sparse graphs sharing many
geometrical properties. For example, social networks,
computer networks, and metabolic networks of certain
organisms all have a logarithmically growing average
geodesic (shortest path) length $\ell$ and an approximately
algebraically decaying distribution of vertex degree.
In addition, social networks typically show high
clustering, or local transitivity: if person A knows
B and C, then B and C are likely to know each other.

Work on the geometry of social networks, which is the
main focus of the present paper, originated from
Rapoport's studies of disease spreading and has
been further developed in Refs. [?]. General mathematical
models for random graphs with a structural bias
are called Markov graphs and were studied in Ref. [?].
In the physics literature, networks with high clustering
are commonly modeled by the small-world network
model of Watts and Strogatz (WS) [?], while networks
with a power-law degree distribution are modeled by the
scale-free network model of Barabási and Albert (BA) [?].
Although both models have a logarithmically increasing $\ell$
with the network size, each model lacks the property of
the other: the WS model shows high clustering
but no power-law degree distribution, while
the BA model, with its scale-free nature, does not possess
high clustering. In this work, we propose a network
model which has both the perfect power-law degree
distribution and high clustering. Furthermore, in our
model, the degree of clustering, measured by the clustering
coefficient (see below), is shown to be tunable and
thus controllable by adjusting a parameter of the model.

We start from the definition of a network as a graph
$G = (V, E)$, where $V$ is the set of vertices and $E$ is the
set of edges [?]. An edge connects a pair of vertices in $V$,
and no more than one edge may connect a specific pair
of vertices. To quantify the clustering, Watts and Strogatz
introduced the clustering coefficient $\gamma = \langle \gamma_v \rangle$, with
the average $\langle \cdots \rangle$ taken over all vertices in $V$. The local
clustering coefficient $\gamma_v$ for the vertex $v$ is defined as follows.
Suppose that the vertex $v$ has $k_v$ neighbors ($k_v$ is called
the degree of the vertex $v$; a neighbor is a vertex separated
from $v$ by exactly one edge). Among those $k_v$ neighbors, there
can exist at most $\binom{k_v}{2} = k_v (k_v - 1)/2$ edges connecting
two of the $k_v$ vertices. If one defines $|E(\Gamma_v)|$, with $\Gamma_v$
the neighborhood of $v$, as the number of edges actually
existing in the network that connect those
neighbors, the local clustering coefficient is written as [?]

$$\gamma_v = \frac{|E(\Gamma_v)|}{\binom{k_v}{2}}. \tag{1}$$
From the above definition, it is clear that $\gamma$ is a measure
of the relative number of triads (fully connected subgraphs
of three vertices). Note also that $\gamma$ lies in
the interval $[0, 1]$, with the upper limit attained only for a
fully connected graph. In a social acquaintance network,
for example, $\gamma = 1$ if everyone in the network knows
everyone else. It should be noted that even though the BA
model successfully explains the scale-free nature of many
networks, it has $\gamma \approx 0$ and thus fails to describe
correctly networks with high clustering, such as social
networks.
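As an illustration of this definition, the following minimal Python sketch computes the local coefficient of a vertex as the number of existing edges among its neighbors divided by $k_v (k_v - 1)/2$, and then averages over all vertices. It is not part of the original analysis. The storage format (a dictionary `adj` mapping each vertex to the set of its neighbors), the function names, and the convention that a vertex with fewer than two neighbors contributes zero are assumptions made here.

```python
from itertools import combinations


def local_clustering(adj, v):
    """Local clustering coefficient of vertex v in a simple undirected graph.

    adj maps each vertex to the set of its neighbors.
    """
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        # Assumed convention: no neighbor pairs means no triads, so return 0.
        return 0.0
    # Count the edges that actually exist among the neighbors of v.
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)


def clustering_coefficient(adj):
    """Average of the local clustering coefficient over all vertices."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


# Example: a triangle 0-1-2 with one pendant vertex 3 attached to vertex 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
```

In this example vertex 0 has three neighbors of which only one pair is linked, giving a local coefficient of 1/3, and the average over the four vertices is 7/12.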

Below we briefly review the BA model of the scale-free
network and present our model for the scale-free network
with high clustering. The BA model [?] is defined as
follows:

• Initial condition: To start with, the network consists
of $m_0$ vertices and no edges.

• Growth: One vertex $v$ with $m$ edges is added at
every time step. Time $t$ is identified as the number
of time steps.

• Preferential attachment (PA): Each edge of $v$ is
then attached to an existing vertex with a probability
proportional to its degree, i.e., the probability
for a vertex $w$ to be attached to $v$ is [?]

$$P_w = \frac{k_w}{\sum_{u \in V} k_u}. \tag{2}$$
FIG. 1: Preferential attachment and triad formation. In the
preferential attachment step (a) the new vertex $v$ chooses a
vertex $u$ to attach to with a probability proportional to its
degree. In the triad formation step (b) the new vertex $v$
chooses a vertex $w$ in the neighborhood of the one linked to
in the previous preferential attachment step. The symbol ×
marks vertices that $v$ is not allowed to attach to (either because
no triad would be formed or because an edge already exists).



In the BA model, the growth step is then iterated $N = |V|$
times, and for each growth step the PA step is iterated
$m$ times, once for each of the $m$ edges of the newly added vertex $v$.

In order to incorporate high clustering, we modify
the above BA algorithm by adding one more step:

• Triad formation (TF): If an edge between $v$ and $w$
was added in the previous PA step, then add one
more edge from $v$ to a randomly chosen neighbor
of $w$. If there remains no pair to connect, i.e., if all
neighbors of $w$ are already connected to $v$, do a
PA step instead.

When a vertex $v$ with $m$ edges is added to the existing
network, we first perform one PA step, and then, for each
of the remaining $m - 1$ edges, perform
a TF step with probability $P_t$ or a PA step with
probability $1 - P_t$. The average number $m_t$ of TF
trials per added vertex is then given by $m_t = (m - 1) P_t$,
which we take as the control parameter in our model (see
Fig. 1). It should be noted that our model reduces to the
original BA model when $m_t = 0$.
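The growth rule described above (one PA step followed by $m - 1$ steps that are TF with probability $P_t$ and PA otherwise) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the implementation used for the numerical results. In particular, the function name, the parameter names `n`, `m0`, `p_t` and `seed`, and the handling of the start (a uniform choice while all eligible vertices still have degree zero) are assumptions made here, since the degree-proportional probability is undefined on a network with no edges. The PA step scans all vertices, so the sketch costs of order $N^2$ operations and is meant for small networks.

```python
import random


def clustered_scale_free(n, m=3, m0=3, p_t=0.5, seed=None):
    """Grow a network by n steps of PA plus triad formation (TF).

    Returns a dict mapping each vertex to the set of its neighbors.
    p_t is the probability of a TF trial, so that m_t = (m - 1) * p_t.
    """
    assert 1 <= m <= m0, "each new vertex needs m distinct existing targets"
    rng = random.Random(seed)
    adj = {u: set() for u in range(m0)}  # m0 vertices and no edges

    def link(a, b):
        adj[a].add(b)
        adj[b].add(a)

    def pa_target(v):
        # Existing vertices not yet linked to v (no self-loops, no multi-edges).
        cand = [u for u in adj if u != v and u not in adj[v]]
        weights = [len(adj[u]) for u in cand]
        if sum(weights) == 0:
            # Assumption: uniform choice while all candidates have degree zero.
            return rng.choice(cand)
        return rng.choices(cand, weights=weights)[0]

    for v in range(m0, m0 + n):
        adj[v] = set()
        w = pa_target(v)  # the first edge is always a PA step
        link(v, w)
        for _ in range(m - 1):
            # Neighbors of the last PA target that v can still attach to.
            free = [u for u in adj[w] if u != v and u not in adj[v]]
            if rng.random() < p_t and free:
                link(v, rng.choice(free))  # TF step: close a triad v-w-u
            else:
                w = pa_target(v)  # PA step, also the fallback when no triad can form
                link(v, w)
    return adj
```

With `p_t = 0` no TF step is ever taken and the sketch reduces to plain preferential attachment, in line with the remark that the model reduces to the BA model when $m_t = 0$.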

The standard scale-free network model not only generates
networks with certain geometrical properties, it also
suggests a mechanism for the emergence of power-law
degree distributions in evolving networks: new actors
(vertices) in a social context prefer to attach to more
connected ("well known") actors. The sociological interpretation
of the triad formation step is that after becoming
acquainted with (linked to) $w$, an actor $v$ is likely to become
acquainted with $w$'s acquaintances as well. This mechanism
for the emergence of clustering is well known, and
was already discussed under the name "sibling bias" in
Ref. [?]. Recently, Ref. [?] provided empirical evidence
for both mechanisms, triad formation and preferential
attachment, used in our construction algorithm.

The clustered scale-free network algorithm defined
above gives the same degree distribution as the standard
scale-free network, at least if every TF step follows a
PA step. To see this, first observe that in a PA step an
arbitrary vertex $v$ increases its degree at the rate

$$\frac{\partial k_v}{\partial t} = A \frac{k_v}{\sum_{u \in V} k_u} \quad \text{for a PA step,} \tag{3}$$
FIG. 2: Degree distribution for the scale-free network model
with tunable clustering with parameter values $m = m_0 = 3$,
$N = 10^5$ at various values of $m_t$. At any value of $m_t$, which
determines the average number of triad formations, $P(k)$
exhibits a power-law behavior like the BA model, which corresponds
to $m_t = 0$.



where the normalization factor $A$ for one edge is determined
to be unity following Ref. [?]. For a TF step the
average increase of $k_v$ is proportional to the probability
that a vertex $w$ in the neighborhood of $v$ was linked to in the
preceding PA step, times the inverse of that vertex's degree (the
probability that $v$ is then chosen among the neighbors of $w$):

$$\frac{\partial k_v}{\partial t} = \sum_{w \in \Gamma_v} \frac{k_w}{\sum_{u \in V} k_u} \, \frac{1}{k_w} = \frac{k_v}{\sum_{u \in V} k_u} \quad \text{for a TF step,} \tag{4}$$



where we have used the same normalization as in Eq. (3),
and $\Gamma_v$ is the neighborhood of $v$ (we use that the number
of vertices in $\Gamma_v$ is $k_v$). From Eqs. (3) and (4) the total
rate for one time step, composed of $m_t$ TF steps and
$m - m_t$ PA steps, is expressed as

$$\frac{\partial k_v}{\partial t} = (m - m_t) \frac{k_v}{\sum_{u \in V} k_u} + m_t \frac{k_v}{\sum_{u \in V} k_u} = m \frac{k_v}{\sum_{u \in V} k_u}, \tag{5}$$

which has the same form as in the original BA model and
thus results in

$$k_v(t) \propto t^{1/2}. \tag{6}$$

Consequently, the degree of an arbitrary vertex increases
as the square root of time, which then yields the
power-law degree distribution $P(k) \sim k^{-3}$ [?].
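For completeness, the step from a rate of the BA form to the exponent 3 can be sketched with the standard continuum argument for the BA model. The sketch starts from the total rate $\partial k_v / \partial t = m k_v / \sum_{u \in V} k_u$, neglects the $m_0$ initial vertices so that $\sum_{u \in V} k_u \approx 2mt$ (each time step adds $m$ edges), takes $k_v(t_v) = m$ at the time $t_v$ when $v$ is added, and assumes $t_v$ to be uniformly distributed on $(0, t]$. These are assumptions of the sketch rather than statements taken from the text above; the result holds for $k \geq m$.

```latex
\begin{align*}
\frac{\partial k_v}{\partial t}
  &= m\,\frac{k_v}{\sum_{u \in V} k_u} \approx \frac{k_v}{2t}
  &&\Rightarrow\quad k_v(t) = m \left(\frac{t}{t_v}\right)^{1/2}, \\
\mathrm{Prob}\,[k_v(t) < k]
  &= \mathrm{Prob}\left[t_v > \frac{m^2 t}{k^2}\right] = 1 - \frac{m^2}{k^2}
  &&\Rightarrow\quad P(k) = \frac{\partial}{\partial k}\left(1 - \frac{m^2}{k^2}\right) = 2 m^2 k^{-3}.
\end{align*}
```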

In the above discussion we have assumed that a TF
step always follows a PA step. If a TF step were
preceded by another TF step, the factor $k_w (1/k_w)$ in
Eq. (4) would be replaced by $k_w (1/(k_w - 1))$, which is a
small correction when $k_w$ is large (which it is likely to be
by the definition of the PA step). Thus the resulting
degree distribution would not differ much from a power
law. In Fig. 2, the degree distributions $P(k)$ at various
FIG. 3: (a) Clustering coefficient $\gamma$ versus the network size $N$
at various values of the average number $m_t$ of triads per time
step. Straight lines show asymptotic values of $\gamma$ at each $m_t$.
For $m_t \neq 0$, $\gamma$ approaches a nonzero value as $N$ is increased.
(b) $\gamma(N \to \infty)$ versus $m_t$: The clustering coefficient can be
varied systematically by changing $m_t$.



values of $m_t$ are displayed, and we find that at any value
of $m_t$ the distribution is well described by a power law
with the exponent $\alpha \approx 3$ in $P(k) \sim k^{-\alpha}$, as is expected
from the above analytic consideration.
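One common way to inspect such a power-law tail numerically, offered here only as a sketch and not as the procedure behind the figures, is to look at the complementary cumulative distribution, i.e. the fraction of vertices with degree at least $k$. It is less noisy in the tail than the raw histogram, and for $P(k) \sim k^{-\alpha}$ it decays approximately as $k^{-(\alpha - 1)}$, so $\alpha \approx 3$ corresponds to a slope near $-2$ on a log-log plot. The function name and the input format (a plain list of vertex degrees) are assumptions made here.

```python
from collections import Counter


def degree_ccdf(degrees):
    """Return {k: fraction of vertices with degree >= k} for each observed k."""
    n = len(degrees)
    counts = Counter(degrees)
    ccdf = {}
    remaining = n  # number of vertices with degree >= the current k
    for k in sorted(counts):
        ccdf[k] = remaining / n
        remaining -= counts[k]
    return ccdf


# Example: four vertices with degrees 1, 1, 2 and 3.
example = degree_ccdf([1, 1, 2, 3])
```

For the example, all four vertices have degree at least 1, two have degree at least 2, and one has degree at least 3, so the result is `{1: 1.0, 2: 0.5, 3: 0.25}`.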

The parameter $m_t$ in our model introduces the clustering
effect into the system by allowing the formation of
triads. We focus only on the case $m = 3$, with the expectation
that other values of $m$ give qualitatively the
same behavior. One then expects that for any $m_0$ a finite
$m_t$ gives a finite clustering coefficient $\gamma$ in the thermodynamic
limit $N \to \infty$, whereas for $m_t = 0$ (the BA
scale-free network model) $\gamma$ goes to zero as $N$ becomes
larger. In Fig. 3(a), $\gamma$ at various values of $m_t$ is shown as
a function of the system size $N$. As expected, we find that $\gamma$
approaches a finite nonzero value as $N$ is increased at
nonzero $m_t$, whereas the BA model, which corresponds
to the limiting case $m_t = 0$ in our model, is confirmed
to have $\gamma = 0$. Furthermore, we also observe that the
relation between $m_t$ and $\gamma$ is almost linear, as depicted
in Fig. 3(b).

From the above observations, we conclude that our
model exhibits both the scale-free nature and high
clustering at the same time, while the WS model (the
BA model) lacks the former (the latter) property. We
note that in many real networks both properties usually
coexist, and thus believe that our model is more realistic.
The triad formation step in our model, which inevitably
gives a high clustering coefficient, is expected to make



FIG. 4: The characteristic path length $\ell$ for the arbitrarily
clustered scale-free network model with the parameters
$m = m_0 = 3$ at various values of $m_t$. Although $\ell$ becomes
larger with $m_t$, $\ell$ is found to behave logarithmically as a
function of $N$.



the average geodesic length larger than in the BA network,
since the edge for the triad could have been used to connect
two vertices separated by a large distance if only
the preferential attachment step were allowed. However,
the characteristic path length $\ell$, defined as the average
geodesic length, is still found to behave logarithmically
with the size $N$, the same behavior as in the WS model and
the BA model. In Fig. 4 we present $\ell$ versus $N$ at various
values of $m_t$. It is shown that $\ell$ becomes larger as
$m_t$ is increased, as expected. Furthermore, Fig. 4 shows
that the increase of $\ell$ is logarithmic for all $m_t$.
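The characteristic path length, i.e. the average geodesic length over pairs of vertices, can be measured with a breadth-first search from every vertex. The following Python sketch illustrates this; it is not the code used for the results above. It assumes the graph is given as a dictionary mapping each vertex to the set of its neighbors, that it contains at least one connected pair, and that unreachable pairs are simply left out of the average. The function names are chosen here for illustration.

```python
from collections import deque


def bfs_distances(adj, source):
    """Geodesic distance from source to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def characteristic_path_length(adj):
    """Average geodesic length over all ordered pairs of connected vertices."""
    total = 0
    pairs = 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs


# Example: the path graph 0 - 1 - 2.
path = {0: {1}, 1: {0, 2}, 2: {1}}
```

For the path graph the three pairs have distances 1, 1 and 2, so the average is 4/3. Each search visits every edge once, so the full measurement costs of order $N$ times the number of edges; for large networks one may prefer to average over a random sample of source vertices.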

By mimicking principles in network formation, a gener- 
ation algorithm can construct graphs with certain topo- 
logical statistics, such as a degree distribution, clustering 
coefficient, and so on. However, it should be emphasized 
that these kinds of algorithms cannot claim to uniformly 
sample the ensemble of networks with specific statistical 
properties. This drawback exists even in more general 
classes of random graphs where structural biases, such 
as clustering, are imposed [?, 10].

Recently, Klemm and Eguíluz [11] have proposed a network
model based on a finite memory of vertices, i.e.,
vertices become inactive and do not get new edges after
a finite number of time steps, and have shown that their
growth and deactivation model exhibits both high
clustering and the scale-free nature. Our model provides
an alternative way to achieve the same feature, the
clustered scale-free nature, based on our frequent everyday
experience of how we become acquainted with newcomers:
B becomes A's new friend because B is introduced by one
of A's friends. Even in the network of scientific citations,
it is likely that the authors of paper A refer to paper B
because they found B when they read a famous review
paper C [12]. This closely resembles
our model, in which the TF step accompanies the PA step. In
Ref. [13], a model with both high clustering and a
scale-free distribution has also been suggested. However,
the power-law degree distribution was assigned to the
network from the start, and the subsequent steps were
devised not to change the degree of any vertex. In other
words, the power-law distribution in Ref. [13] was not an
emergent property of the model, in contrast to
the BA model as well as our model in this work. Very
recently, we have learned about the work by Davidsen et
al. [14], which is based on the same observation of triad
formation as ours and has been shown to possess similar
network properties, i.e., high clustering, small average
geodesic length, and a scale-free distribution. We believe,
however, that our model has some advantage in describing
networks which grow in time, whereas the network
model in Ref. [14] has a fixed network size.

In conclusion, we have proposed an algorithm for the generation
of growing networks with a power-law degree distribution,
a logarithmic increase of the average geodesic
length, and a finite clustering. The last two properties
qualify the generated graphs as small-world networks
in the Watts and Strogatz sense, in addition to
their scale-freeness. The simple relation between the
control parameter $m_t$ and $\gamma$ further increases the usefulness of the
suggested algorithm, making it possible to tune the clustering
coefficient in a systematic way.